Universal Count Correction for High-Throughput Sequencing
Abstract
We show that existing RNA-seq, DNase-seq, and ChIP-seq data exhibit overdispersed per-base read count distributions that do not match the assumptions of existing computational methods. To compensate for this overdispersion, we introduce FIXSEQ, a nonparametric and universal method for processing per-base sequencing read count data. We demonstrate that FIXSEQ substantially improves the performance of existing RNA-seq, DNase-seq, and ChIP-seq analysis tools when compared with existing alternatives.
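To make the notion of overdispersion concrete, the following minimal sketch computes the variance-to-mean ratio (index of dispersion) of per-base read counts. This is only a common diagnostic, not the FIXSEQ method itself; the function name `dispersion_index` and both count vectors are hypothetical and made up for illustration. For Poisson-distributed counts the ratio is close to 1, so values well above 1 suggest that a Poisson assumption is violated.

```python
from statistics import mean, pvariance


def dispersion_index(counts):
    """Return the variance-to-mean ratio of per-base read counts.

    A ratio near 1 is what a Poisson model would predict; a ratio
    well above 1 indicates overdispersion.
    """
    mu = mean(counts)
    if mu == 0:
        raise ValueError("mean count is zero; dispersion index is undefined")
    return pvariance(counts) / mu


# Hypothetical per-base counts (illustrative values, not real data).
# Both vectors have the same mean coverage of 2 reads per base.
even_coverage = [2, 3, 2, 1, 3, 2, 2, 1, 3, 1]
bursty_coverage = [0, 0, 0, 12, 0, 0, 7, 0, 0, 1]

print("even coverage:  ", dispersion_index(even_coverage))
print("bursty coverage:", dispersion_index(bursty_coverage))
```

Although both vectors have the same mean, the bursty one, where reads pile up at a few positions, yields a much larger ratio. That pattern is the kind of departure from Poisson-like behavior that the abstract describes.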
Related resources
Reptile: representative tiling for short read error correction
MOTIVATION Error correction is critical to the success of next-generation sequencing applications, such as resequencing and de novo genome sequencing. It is especially important for high-throughput short-read sequencing, where reads are much shorter and more abundant, and errors more frequent than in traditional Sanger sequencing. Processing massive numbers of short reads with existing error co...
Estimation and correction for GC-content bias in high throughput sequencing
GC-content bias describes the dependence between fragment count (read coverage) and GC content found in high-throughput sequencing assays, particularly the Illumina Genome Analyzer technology. This bias can dominate the signal of interest for analyses that focus on measuring fragment abundance within a genome, such as copy number estimation. The bias is not consistent between samples, and curre...
HiTEC: accurate error correction in high-throughput sequencing data
MOTIVATION High-throughput sequencing technologies produce very large amounts of data and sequencing errors constitute one of the major problems in analyzing such data. Current algorithms for correcting these errors are not very accurate and do not automatically adapt to the given data. RESULTS We present HiTEC, an algorithm that provides a highly accurate, robust and fully automated method t...
A review of DNA sequencing techniques (first, second, and third generation)
Introduction: DNA sequencing is the most important technique in molecular biology; it identifies the order of the nucleotides in a piece of DNA. There are several different methods for sequencing DNA. DNA sequencing is now of great importance in medical diagnostics and other medical fields. Some methods have been invented to speed up and increase the efficiency of the...
Summarizing and correcting the GC content bias in high-throughput sequencing
GC content bias describes the dependence between fragment count (read coverage) and GC content found in Illumina sequencing data. This bias can dominate the signal of interest for analyses that focus on measuring fragment abundance within a genome, such as copy number estimation (DNA-seq). The bias is not consistent between samples, and there is no consensus as to the best methods to remove it ...